Skip to content

ENH: clearer error msg for unequal categoricals in merge_asof (GH#26136) #26242

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

chrish42
Copy link
Contributor

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a test for this that ensures this works when they are "equal"?

@WillAyd WillAyd added Categorical Categorical Data Type Error Reporting Incorrect or improved errors from pandas Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Apr 29, 2019
@codecov
Copy link

codecov bot commented Apr 29, 2019

Codecov Report

Merging #26242 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #26242      +/-   ##
==========================================
- Coverage   91.97%   91.96%   -0.01%     
==========================================
  Files         175      175              
  Lines       52368    52371       +3     
==========================================
  Hits        48164    48164              
- Misses       4204     4207       +3
Flag Coverage Δ
#multiple 90.52% <100%> (ø) ⬆️
#single 40.69% <0%> (-0.16%) ⬇️
Impacted Files Coverage Δ
pandas/core/reshape/merge.py 94.49% <100%> (+0.02%) ⬆️
pandas/io/gbq.py 78.94% <0%> (-10.53%) ⬇️
pandas/core/frame.py 96.9% <0%> (-0.12%) ⬇️
pandas/util/testing.py 90.71% <0%> (+0.1%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9feb3ad...f89c29c. Read the comment docs.

@codecov
Copy link

codecov bot commented Apr 29, 2019

Codecov Report

Merging #26242 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #26242      +/-   ##
==========================================
+ Coverage   91.97%   91.97%   +<.01%     
==========================================
  Files         175      175              
  Lines       52368    52389      +21     
==========================================
+ Hits        48164    48184      +20     
- Misses       4204     4205       +1
Flag Coverage Δ
#multiple 90.52% <100%> (ø) ⬆️
#single 40.72% <0%> (-0.13%) ⬇️
Impacted Files Coverage Δ
pandas/core/reshape/merge.py 94.49% <100%> (+0.02%) ⬆️
pandas/io/gbq.py 78.94% <0%> (-10.53%) ⬇️
pandas/io/excel/_base.py 92.13% <0%> (-0.72%) ⬇️
pandas/core/indexes/datetimelike.py 98.13% <0%> (-0.4%) ⬇️
pandas/core/base.py 97.98% <0%> (-0.22%) ⬇️
pandas/plotting/_core.py 83.63% <0%> (-0.14%) ⬇️
pandas/core/frame.py 96.9% <0%> (-0.12%) ⬇️
pandas/core/generic.py 93.54% <0%> (ø) ⬆️
pandas/core/arrays/period.py 98.53% <0%> (ø) ⬆️
pandas/core/arrays/interval.py 93.06% <0%> (ø) ⬆️
... and 10 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9feb3ad...464fa34. Read the comment docs.

@chrish42
Copy link
Contributor Author

Good question, actually. Yes, there is. It's test_basic_categorical (in the same file). But before I added the new test in this pull request, that was the only test that exercised the code paths of merge_asof for categoricals. I feel there should be more of those... That would be another pull request IMHO. though.

@WillAyd
Copy link
Member

WillAyd commented Apr 30, 2019

Thanks for clarifying. So this change does introduce unordered Categoricals which might be counter to what merge_asof is supposed to offer (the docstring explicitly states the join field must be ordered).

Do you you see any other instances of that method supporting unordered data? Wondering if we need to further require that Categoricals are ordered to use this method

@chrish42
Copy link
Contributor Author

@WillAyd Indeed, the merge keys should be ordered, otherwise merge_asof() makes no sense. Currently, the merge fails later (in _get_join_indexers()) with a ValueError. But fun fact, there is no test currently in test_merge_asof.py. I've added a test that undordered categorical merge keys raises.

I've also added code to raise a MergeError (instead of a ValueError further down the road). But maybe adding this error check in in _get_merge_keys() is not the right place, because by then, the by= keys have been added to the list of merge keys.

Is there a reliable way to skip over the by= keys in this new test? Or should I put this test in another method? Thanks!

@WillAyd
Copy link
Member

WillAyd commented Apr 30, 2019

Hmm that's unfortunate. If it makes things too complicated I'm not against looking that in more detail as a follow up if you can open an issue for it

@chrish42
Copy link
Contributor Author

Cool. I've reverted the last change, and added a comment to explain what's happening currently. I've created #26249 for the other cleanups.

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for opening an issue for follow ups. I'm OK with this as is with cleanups to follow. Over to @jreback

@WillAyd WillAyd added this to the 0.25.0 milestone May 1, 2019
@jreback jreback merged commit c2c7939 into pandas-dev:master May 1, 2019
@jreback
Copy link
Contributor

jreback commented May 1, 2019

thanks @chrish42 nice patch!

@chrish42 chrish42 deleted the merge_asof-categorical-error-tweak branch May 2, 2019 14:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Categorical Categorical Data Type Error Reporting Incorrect or improved errors from pandas Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Confusing "MergeError: incompatible merge keys [1] category and category, must be the same type"
3 participants